Use timely's logging infrastructure to log Tracker state #321

saradecova · 2020-03-05T13:29:19Z

Add TrackerEvent which records additions or removals
of capabilities, as well as propagation events when changes
in implications are propagated along the internal connections
and edges of the graph.

Add DebugEvent which records the state of pointstamps,
implications, and worklist of Tracker.

Enabling loggers for these events is done the same way as for
logging::TimelyEvent's

At the moment, we use these loggers for comparative testing between the
Isabelle implementation of progress tracking and our Rust implementation.

This graph plots the runtime of the computation with and without the changes, and suggests
that the increase is not significant despite the changes being on the critical path.
The example used is timely/examples/barrier.rs with 10,000,000 samples, run
with four workers. The graph shows complementary cdf for:

1. current master,
2. change with all loggers disabled (almost identical to (1)),
3. original TimelyEvent logger, and
4. original TimelyEvent logger + TrackerEvent logger.

We got similar results when running other examples.


frankmcsherry · 2020-03-05T21:14:31Z

timely/src/progress/frontier.rs

@@ -317,6 +317,13 @@ impl<T: PartialOrder+Ord+Clone> MutableAntichain<T> {
        self.frontier().less_equal(time)
    }

+    /// Clones the vector of updates.
+    /// Only used for debugging purposes.


Can you say a bit more about what this is used for in the comments? If this lands, I'll need to support it, and it would help to understand why it is here and under which circumstances it could go away.

Removed the function as it's no longer needed.

frankmcsherry · 2020-03-05T21:15:11Z

timely/src/progress/frontier.rs

+    /// Clones the vector of updates.
+    /// Only used for debugging purposes.
+    #[inline]
+    pub fn updates(&self) -> Vec<(T, i64)> {


Could this return a &[(T, i64)] instead, to avoid a mandatory allocation?

Removed the function completely as it's no longer needed.
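For readers following the allocation question above, here is a minimal stand-alone sketch of the two accessor styles. `UpdateBuffer` is a hypothetical stand-in invented for illustration; it is not timely's `MutableAntichain`, and the method names are assumptions.

```rust
/// Hypothetical stand-in for a structure that buffers `(time, diff)` updates.
/// This is not timely's `MutableAntichain`; it only illustrates the two accessors.
struct UpdateBuffer<T> {
    updates: Vec<(T, i64)>,
}

impl<T: Clone> UpdateBuffer<T> {
    /// Cloning accessor: allocates a new vector on every call.
    fn updates_cloned(&self) -> Vec<(T, i64)> {
        self.updates.clone()
    }

    /// Borrowing accessor: no allocation; callers clone only if they need ownership.
    fn updates(&self) -> &[(T, i64)] {
        &self.updates
    }
}

fn main() {
    let buffer = UpdateBuffer { updates: vec![(3u64, 1), (5, -1)] };
    // Read-only use needs no allocation.
    let net: i64 = buffer.updates().iter().map(|(_, diff)| diff).sum();
    println!("net change: {}", net);
    // Ownership is still available on demand.
    let owned: Vec<(u64, i64)> = buffer.updates().to_vec();
    assert_eq!(owned, buffer.updates_cloned());
}
```

With the borrowing form, a debugging caller that only inspects the updates pays nothing, and one that needs an owned copy can still call `to_vec()`.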

frankmcsherry · 2020-03-05T21:19:14Z

timely/src/progress/reachability.rs

@@ -145,8 +147,9 @@ pub struct Builder<T: Timestamp> {
 impl<T: Timestamp> Builder<T> {

    /// Create a new empty topology builder.
-    pub fn new() -> Self {
+    pub fn new(path: Vec<usize>) -> Self {


I think this is a big abstraction change, in that reachability.rs used to be agnostic to the hierarchical nature of names in timely dataflow, and just did reachability tracking in a scope with no additional information. This seems to bake that in now, which .. will have to ponder whether that is a good call or not.

Another option might be to produce another implementation of reachability.rs, putting things behind a trait.

Not trying to be difficult, but attempting to minimize the complexity in an already-too-complicated bit of logic.

That's a good point. I removed the path argument from the Tracker and Builder constructors.

However, it still must be supplied when the logger is registered. The logger now looks like this:

pub tracker_logger: Option<(Vec<usize>, crate::logging::Logger<TrackerEvent>)>,

The logging events such as "a capability is added/removed from a location" do not make sense if they cannot be tracked down to a particular location.

Let me know what you think please and whether you would still prefer going your way.
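A rough sketch of the arrangement described in this reply, in which the tracker itself stays unaware of the scope hierarchy and the path travels with the logger. All names here (`ToyTrackerEvent`, `ScopeAgnosticTracker`, the `Vec` used as a sink) are hypothetical and stand in for the real `Tracker` and `crate::logging::Logger<TrackerEvent>`.

```rust
/// Hypothetical event: a capability change at a node, qualified by the scope path.
#[derive(Debug, PartialEq)]
struct ToyTrackerEvent {
    scope_addr: Vec<usize>,
    node: usize,
    diff: i64,
}

/// Toy tracker that takes no path in its constructor.
/// The path is supplied only together with the (optional) logger.
struct ScopeAgnosticTracker {
    /// `(path, sink)`; a `Vec` stands in for a real logger here.
    logger: Option<(Vec<usize>, Vec<ToyTrackerEvent>)>,
}

impl ScopeAgnosticTracker {
    /// Records a capability change; logs it only if a logger was registered.
    fn update(&mut self, node: usize, diff: i64) {
        if let Some((path, sink)) = self.logger.as_mut() {
            sink.push(ToyTrackerEvent { scope_addr: path.clone(), node, diff });
        }
    }
}

fn main() {
    let mut tracker = ScopeAgnosticTracker { logger: Some((vec![0, 2], Vec::new())) };
    tracker.update(3, 1);
    println!("{:?}", tracker.logger.as_ref().map(|(_, events)| events));
}
```

When no logger is registered the tracker never touches a path at all, which is the property the review comment asked for.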

frankmcsherry · 2020-03-05T21:37:32Z

I have some general questions about the PR!

I think I understand the goal, which is to get out information about the steps that reachability.rs takes as it runs. What I'm less clear on is whether this is the best way to do that. I think that what's in reachability.rs is essentially deterministic once its inputs are specified. So, if we captured the timestamp changes that are supplied as inputs, and the moments at which propagation happens, I think this would be sufficient information to re-execute the code and see what happens, but without asking reachability.rs to record this for you.

Would it serve your purposes just as well to instrument the moments and nature of pointstamp updates and propagation?
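The replay idea in this comment can be sketched with a toy model, assuming the tracker is a deterministic function of its inputs. `Input`, `ToyTracker`, and `replay` are hypothetical names for illustration only and do not reflect the actual code in reachability.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical input events: the only things that would need to be logged.
#[derive(Clone, Debug, PartialEq)]
enum Input {
    /// A pointstamp change at `(location, time)` by `diff`.
    Update { location: usize, time: u64, diff: i64 },
    /// A moment at which buffered changes are propagated.
    Propagate,
}

/// Toy deterministic tracker; a stand-in, not timely's `Tracker`.
#[derive(Default, Debug, PartialEq)]
struct ToyTracker {
    pending: Vec<(usize, u64, i64)>,
    counts: BTreeMap<(usize, u64), i64>,
}

impl ToyTracker {
    fn apply(&mut self, input: &Input) {
        match input {
            Input::Update { location, time, diff } => {
                self.pending.push((*location, *time, *diff));
            }
            Input::Propagate => {
                for (location, time, diff) in self.pending.drain(..) {
                    let count = {
                        let entry = self.counts.entry((location, time)).or_insert(0);
                        *entry += diff;
                        *entry
                    };
                    // Drop entries whose net count has returned to zero.
                    if count == 0 {
                        self.counts.remove(&(location, time));
                    }
                }
            }
        }
    }
}

/// Re-executes a recorded input log starting from an empty tracker.
fn replay(log: &[Input]) -> ToyTracker {
    let mut tracker = ToyTracker::default();
    for input in log {
        tracker.apply(input);
    }
    tracker
}

fn main() {
    let log = vec![
        Input::Update { location: 0, time: 1, diff: 1 },
        Input::Update { location: 0, time: 1, diff: -1 },
        Input::Update { location: 1, time: 2, diff: 1 },
        Input::Propagate,
    ];
    // Two independent replays of the same log end in the same state.
    assert_eq!(replay(&log), replay(&log));
    println!("{:?}", replay(&log).counts);
}
```

Under that determinism assumption, logging only the inputs and the propagation moments is enough for a consumer to reconstruct every intermediate state offline.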

saradecova · 2020-03-09T17:29:25Z

Thank you for the comments! I changed the logging so there is only one additional logger (as opposed to two), logging the minimum information needed to re-execute the code, as you suggested.

utaal · 2020-03-13T13:30:41Z

Hi folks, a couple more notes from a conversation with @saradecova.

We're using Debug to convert timestamps of arbitrary types to Strings for logging:

https://github.com/TimelyDataflow/timely-dataflow/pull/321/files#diff-d233d1f563e31c39b0ad93a7182fdcb3R493

This seems reasonable (we wouldn't know how to encode the type otherwise), however this makes it harder (impossible) for the consumer to know what type to expect for a certain scope. As an example, to be able to replay the behaviour of the Tracker using a trace generated with these events, we need to parse the timestamps (to re-establish the partial order, mainly).

We're wondering if it makes sense to add (or adjust an existing) TimelyEvent to output type information for scope timestamps. This way, a consumer can determine how to parse the stringified timestamps (possibly by providing FromStr implementations).

One option would be to use TypeId (docs), which unfortunately hides its internals, and makes it hard (impossible) to parse at the receiver.

I'm still considering options, but @frankmcsherry let us know if you have opinions.

saradecova · 2020-03-13T13:48:03Z

To follow on @utaal and our conversation, we can encode the type in string using std::any::type_name.

Moreover, since we add internal_summaries to OperatesEvent, users of the TimelyEvent log stream would also benefit from having the timestamp type.

In practice, this could be an event informing us that "a new subscope was created at address addr from the root with associated timestamp_type":

/// The creation of a `Subgraph`.
pub struct SubgraphEvent {
    pub addr: Vec<usize>,
    pub timestamp_type: String,
}
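To make the producer and consumer sides of this proposal concrete, here is a small sketch using only the standard library. `LoggedTimestamp`, `log_timestamp`, and `parse_timestamp` are hypothetical helpers, not part of timely; the sketch assumes the consumer knows which Rust types it is prepared to parse.

```rust
use std::any::type_name;
use std::fmt::Debug;
use std::str::FromStr;

/// Hypothetical log record: a timestamp rendered with `Debug`, plus its type name.
struct LoggedTimestamp {
    timestamp_type: String,
    rendered: String,
}

/// Producer side: stringify any `Debug` timestamp and record its type name.
fn log_timestamp<T: Debug>(time: &T) -> LoggedTimestamp {
    LoggedTimestamp {
        timestamp_type: type_name::<T>().to_string(),
        rendered: format!("{:?}", time),
    }
}

/// Consumer side: parse back into `T`, but only if the recorded type name matches.
fn parse_timestamp<T: FromStr>(record: &LoggedTimestamp) -> Option<T> {
    if record.timestamp_type == type_name::<T>() {
        record.rendered.parse().ok()
    } else {
        None
    }
}

fn main() {
    let record = log_timestamp(&42u64);
    let parsed: Option<u64> = parse_timestamp(&record);
    println!("{} -> {:?}", record.timestamp_type, parsed);
}
```

Two caveats apply. The standard library documents the output of `type_name` as a best-effort description that may change between compiler versions, so producer and consumer should be built with the same toolchain. Also, this only round-trips types whose `Debug` output happens to be accepted by their `FromStr` implementation, such as plain integers; composite timestamps would need a dedicated parser.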

utaal · 2020-03-13T13:49:44Z

Also, it may be the right time to address these todo(s) in ProgressEvent:

timely-dataflow/timely/src/progress/broadcast.rs

Lines 68 to 69 in 06fac10

    
    messages: Vec::new(),
    internal: Vec::new(),

timely-dataflow/timely/src/progress/broadcast.rs

Lines 118 to 119 in 06fac10

    
    messages: Vec::new(),
    internal: Vec::new(),


frankmcsherry · 2020-05-04T20:26:59Z

I'm back to looking at this. Very sorry for the delay.

I have several spot comments, and generally think that before landing durably in timely it needs a bit more design work. In particular,

There are some use cases close to but not yet well-served by this (most interesting one from several users: which (operator, timestamp) pairs are on the frontier). I'd like to sort these out, but that doesn't have to block this.
The use of String for timestamps weirds me out. It also seems to be used for Antichain, which .. I can see how this gets you the data you need, but it can't be the best way. :D
The change to OperatesEvent is substantial; I wonder if it could be normalized out in to another stream?

I still don't have a great read on the requirements here. I apologize if my comments have been confused. My understanding is that you want to be able to extract progress information from the reachability subsystem, and I'm guessing that is to drive your work on progress stuffs. Do you need the fine-grained update information, or just the aggregate information extracted in propagate_all? Do you expect the information to be helpful outside of your work (roughly: should this be a branch, or is it valuable to have all timely users have access to this)?

I'm currently trying to reconcile this with other requests for information about progress tracking, which I think are more about "log the state of the dataflow-wide frontier". It probably relates, and definitely has the same awkwardness around timestamps being generic.

Anyhow, I'm thinking about this now, and trying to understand which things are important to log and which are optional!

frankmcsherry · 2020-05-08T15:01:04Z

I have another ask: is there a qualitative difference for you between logging the progress updates in update_target and update_source as you do, versus logging them in propagate_all as (or just before) they are drained out of their buffers? The latter has the benefit of being a bit less noisy (we have consolidated the updates and potentially canceled some out), but it would remove visibility into the first moments at which the tracker knows about some updates (in case that was part of your study).

EDIT: It also has the nice property that they can be logged transactionally, all at the same timestamp, which may avoid transient weirdness for folks looking at the data (I don't believe we work hard to fold in positive updates before negative updates).
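The trade-off described above (less noise after consolidation, but no view of individual arrivals) can be illustrated with a small sketch. The `consolidate` function and the `((location, time), diff)` tuple layout are assumptions made for illustration; they are not the buffers used inside `propagate_all`.

```rust
use std::collections::BTreeMap;

/// Consolidates `((location, time), diff)` updates:
/// sums the diffs per key and drops keys whose net diff is zero.
fn consolidate(updates: &[((usize, u64), i64)]) -> Vec<((usize, u64), i64)> {
    let mut totals: BTreeMap<(usize, u64), i64> = BTreeMap::new();
    for (key, diff) in updates {
        *totals.entry(*key).or_insert(0) += diff;
    }
    totals.into_iter().filter(|(_, diff)| *diff != 0).collect()
}

fn main() {
    // Raw updates as they might arrive one at a time.
    let raw = vec![((0, 5), 1), ((0, 5), -1), ((1, 7), 1), ((1, 7), 1)];
    // What would be logged if logging happened only after consolidation.
    let logged = consolidate(&raw);
    println!("{} raw updates, {} logged: {:?}", raw.len(), logged.len(), logged);
}
```

Here four raw updates collapse into a single logged entry, and the `+1`/`-1` pair at `(0, 5)` disappears entirely, which is exactly the information a study of first-arrival moments would lose.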

This looks like a good direction to go to expose information about the state of system progress, but I bet my requirements are not your requirements (e.g. that change above is fine for me, but I don't want to do it if it breaks your reqs).

There are some stray lines added and removed; I can fix these up. Mostly I'm trying to map out "imagine this lands; what 'improvements' should be prohibited?"

frankmcsherry · 2020-05-08T15:04:56Z

timely/src/progress/subgraph.rs

+        // Perhaps log information about the creation of subgraph.
+        if let Some(l) = self.logging.as_mut() {
+            l.log(crate::logging::SubgraphEvent{
+                id: worker.index(),
+                addr: path.clone(),
+                timestamp_type: std::any::type_name::<TInner>().to_string(),
+            });
+        }


I'd love to reframe this as a TrackerEvent and have it be part of the line just up above (i.e. "tracker came in to existence"). I suspect something like Tracker::install_logger(...) could do both of those things and wrap up the abstraction well. I'm happy to do that after the fact if that works for you.

frankmcsherry · 2020-05-08T15:05:37Z

timely/src/progress/subgraph.rs

+        // double-check that child 0 (the outside world) is correctly shaped.
+        assert_eq!(self.children[0].outputs, self.inputs());
+        assert_eq!(self.children[0].inputs, self.outputs());
+


Can you explain why these moved down 20 lines?

frankmcsherry · 2020-05-08T15:07:04Z

timely/src/worker.rs

+        let (internal_summary, _) = operator.get_internal_summary();
+


If at all possible, I'd like to keep this next to the set_external_summary() call just to be clear that they are paired. I'm happy to have it hoisted as well.

frankmcsherry · 2020-05-08T15:08:13Z

timely/src/logging.rs

+    /// Internal summary for every combination of input and output port.
+    pub internal_summaries: Vec<Vec<String>>,


Can you explain what these are used for? Would it be equally beneficial to have the report from the tracker about its input-to-output summaries? That would leave this event stable and consolidate the timestamp/summary related events to the reachability tracker.

frankmcsherry · 2020-05-08T15:09:21Z

timely/src/progress/reachability.rs

@@ -597,7 +629,6 @@ impl<T:Timestamp> Tracker<T> {
        //       will discover zero-change times when we first visit them, as no further
        //       changes can be made to them once we complete them.
        while let Some(Reverse((time, location, mut diff))) = self.worklist.pop() {
-


random whitespace

frankmcsherry · 2020-05-08T15:09:28Z

timely/src/progress/reachability.rs

@@ -654,6 +685,7 @@ impl<T:Timestamp> Tracker<T> {
                };
            }
        }
+


random whitespace

frankmcsherry · 2021-01-22T12:18:21Z

Note to self: this would be great pointed at the new logging channel now introduced in #352.


frankmcsherry · 2021-03-30T21:54:13Z

Closing in favor of #375 which borrows heavily from this. If it turns out that there is an urgent need for e.g. logging the topology information, I can certainly make that happen too.

saradecova force-pushed the extend-logging branch from e3908d5 to 6db0fbf Compare March 5, 2020 14:04

frankmcsherry reviewed Mar 5, 2020


Reply to comments

9d66121

saradecova force-pushed the extend-logging branch from 9e77029 to 9d66121 Compare March 9, 2020 17:33

Make TrackerEvent fields public.

9617929

Add SubgraphEvent to TimelyEvents

4633745

Logs an event whenever a new instance of a Subgraph is created.

frankmcsherry mentioned this pull request May 7, 2020

Properly log progress messages #326

Closed

frankmcsherry reviewed May 8, 2020


frankmcsherry mentioned this pull request Nov 8, 2020

Tracking progress towards completion. #340

Open

frankmcsherry mentioned this pull request Jan 18, 2021

Actually log progress updates #352

Merged

utaal added a commit to utaal/timely-dataflow that referenced this pull request Jan 25, 2021

Log internal summaries of operators

8e5db21

Based on PR TimelyDataflow#321 by @saradecova

frankmcsherry mentioned this pull request Mar 30, 2021

Reachability logging #375

Merged

frankmcsherry closed this Mar 30, 2021
